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An efficient surveillance system is a crucial factor in identifying, monitoring 
and tackling outbreaks of infectious diseases. Scarcity of data and limited 
amounts of economic resources require a targeted effort from public health 
authorities. In this paper, we propose a mathematical method to identify 
areas where surveillance is critical and low reporting rates might leave epi- 
demics undetected. Our approach combines the use of reference-based 
susceptible -exposed -infectious models and observed reporting data; We pro- 
pose two different specifications, for constant and time-varying surveillance, 
respectively. Our case study is centred around the spread of the raccoon 
rabies epidemic in the state of New York, using data collected between 1990 
and 2007. Both methods offer a feasible solution to analyse and identify 
areas of intervention. 
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1. Introduction 

As pointed out in Microbial threats to health: emergence, detection and response [1], the 
degree of success of global and national efforts to create public health infrastructure 
with effective systems of surveillance and response is a key variable influencing the 
future impact of infectious diseases. According to WHO, surveillance is an ongoing, 
systematic collection, analysis and interpretation of health-related data essential to plan- 
ning, implementation and evaluation of public health practice (http://www.who.int/ 
immunization_monitoring/burden/ routine_surveillance / en/ index.html) . Sur- 
veillance plays a major role in devising public health strategies to curtail the 
spread of infectious diseases and early detection remains the first line of defence 
in preventing the emergence of novel disease outbreaks. Often, surveillance is 
the decisive factor in triggering early intervention [2,3], in order to avoid the 
higher public health costs associated with a widespread infection in the case an 
outbreak has gone undetected. 

The definition of an epidemic /epizootic or outbreak is varied and has a 
long history of confusion (see Rosenburg [4] for an account of the history of 
the concept of an epidemic). Contemporary discussions have assumed at 
least two definitions of epidemic or outbreak occurrence. Childs et al. [5], for 
example, consider a rabies outbreak as occurring when the observed number 
of cases falls above a baseline for a specified number of consecutive observation 
periods and where the average number of cases in a given location determines 
the base line. They suggest an above-average reported rate at the county level 
for three consecutive months. The other most common definition treats any 
occurrence of an infectious disease as an outbreak, where it is detected in a 
novel geographical location and poses a significant public health threat, because 
of its novel appearance in that location. Throughout this paper, we adhere to 
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this latter definition since we are concerned with uncovering 
appropriate surveillance strategies for detecting novel occur- 
rences of disease. 

Resources for infectious disease surveillance are always 
in limited supply and any strategy that provides insights 
into the optimal guidance of surveillance programmes is 
a valued addition to our public health infrastructure [6]. 
Guidance strategies should include the identification of both 
areas and populations that are at increased risk of disease 
exposure. This is the key idea associated with the concept of 
targeted surveillance (also known as risk-based surveillance) 
defined as a surveillance strategy that focuses sampling on 
high-risk populations in which specific and commonly 
known risk factors exist [7]. The concept of targeted surveil- 
lance was first formally introduced following the emergence 
of bovine spongiform encephalopathy (BSE) in the UK 
during the 1996 epidemic [8]. This idea is also behind the 
recently emerging field of model-guided surveillance [9]. 

In the USA, the Council of State and Territorial Epidemiol- 
ogists, in collaboration with the Center for Disease Control 
(CDC), maintains a list of notifiable diseases constituting the 
National Notifiable Diseases Surveillance System. For human 
diseases, healthcare providers are an essential component of 
any surveillance programme, but their impact is significantly 
reduced when confronted with an epidemic of zoonotic 
origin. Monitoring of wildlife reservoirs is an essential com- 
ponent of detection but rarely undertaken routinely. What 
we understand of zoonotic epidemics is largely constructed 
from passive reporting of occurrences gleaned from haphazard 
and incomplete surveillance of animal populations usually as 
the result of an animal-human interaction [10]. For the pur- 
poses of our analysis, the reporting rate (or equivalently the 
detection rate) is taken to constitute the fraction of reported 
cases over the total number of infections. Reporting rates 
vary significantly over both time and space and may deviate 
significantly from the true underlying distribution of infections 
due to a variety of sources (e.g. variation in the size and extent 
of infection clusters, heterogeneity in human and host popu- 
lation densities, etc. [11]). However, these factors explain 
only partially the spatial and temporal heterogeneity in report- 
ing rate. Variation in the implementation and structure of 
surveillance programmes can themselves be a significant 
source of reporting rate variation and a mapping of different 
levels of reporting rate and surveillance efforts across space 
or time can help identify specific areas in need of intervention. 

A variety of mathematical models are available in the litera- 
ture to describe the dynamics of infectious diseases using the 
generalized susceptible -exposed -infectious -removed (SEIR) 
modelling structure (see, for instance, [12,13] or, for a spatially 
continuous model, [14]), and some work has been done at 
estimating the reporting rates for some human diseases confer- 
ing lifelong immunity [15,16], but little effort has been directed 
at elucidating how to incorporate reporting data into models of 
surveillance [17], especially from an ecological viewpoint [18]. 
The goal of this paper is to show how to use reporting data 
(both reports of positive and negative occurrences) to identify 
geographical areas where surveillance levels are potentially 
insufficient to detect outbreaks. 

Our approach is intended to provide a useful tool for public 
health agents, who monitor critical areas for surveillance and 
allocate funds for increased intervention. We introduce two 
different methods depending on whether agents have fixed 
or time-varying reporting rate data. The first method is based 



on a simple, constant reporting rate, intended to model a 
constant level of surveillance over time. Considering that 
surveillance levels usually change as a consequence of case 
detection and local public health concerns, we relax this 
assumption in our second method, where we formulate a 
reporting rate that changes over time and depends on the 
total number of reports (positive and negative) and the esti- 
mated host population. Provided that such an estimate is 
moderately accurate at any given time, it is possible to track dis- 
ease dynamics through a model for infectious spread. The first 
approach identifies a surveillance risk, whereas the second one 
identifies a surveillance efficacy. The concepts are not mutually 
exclusive, and the observed correlation between our results 
from the two approaches supports their mutual consistency. 
As a consequence, either method can be used to identify 
areas where surveillance levels are critical, possibly underas- 
sessed and potentially leaving an outbreak unidentified. 
Such evaluation relies on comparing the values of computable 
parameters (risk or efficacy) across different counties. From the 
public health standpoint, the areas identified by the method as 
at risk are the ones where additional resources should be allo- 
cated for targeted monitoring. The proposed models provide 
input for explicit assessment of which counties need active 
intervention by public health decision-makers. 

The approach we introduce combines process-driven and 
observational methods. It is quite general and suitable to a 
wide range of infectious disease systems and datasets. More- 
over, it has great potential for application to human diseases. 
The approach relies on good estimates of the population size 
and a good knowledge of the epidemiology of the disease. 
Both aspects are crucial, and often poorly specified. In the 
case of human diseases, the knowledge of the susceptible 
population and mitigated uncertainty about the epidemiolo- 
gical parameters of the disease would significantly increase 
the accuracy of the method. The model then can serve as 
a basis to improve surveillance strategies, particularly in 
disadvantaged regions. 

For illustrative purposes, we apply our method specifically 
to the spread of raccoon rabies virus among its raccoon 
(Procyon lotor) hosts in the state of New York. Rabies, a viral 
encephalomyelitis specific to mammals, has been a CDC noti- 
fiable disease since the mid-1970s. Rabies has the longest 
extant record of reports of any zoonotic disease in the USA. 
Rabies virus is transmitted from one animal to another usually 
by a bite [19,20]. Because its transmission modality is favour- 
able to interspecies infection, including human beings, rabies 
is a major public health concern. Raccoons are the major terres- 
trial vector of the disease in the eastern USA, though many 
foxes, bats and skunks carry the disease as well [10,21]. The 
potential risks to humans coupled with an extensive database 
with high geographical resolution, exact occurrence dates and 
knowledge of the species of host involved engenders the appli- 
cation particularly relevant and amenable to testing our 
methods and approach. 

2. Material and methods 
2.1. Model 

We consider the dynamics of a lethal disease, as described by a 
compartmentalized model of susceptible -exposed -infectious 
(SEI) type. The model subdivides the population into suscep- 
tible, exposed (hosts that have been exposed to the virus but 



not yet infectious) and infectious (host with the capability of 
transmitting the pathogen). The spatial resolution of the 
model is set at regional level (from township to state). Conse- 
quently, the computational model consists of a system of ODEs 

S' =aA — bNS - piS, 
E' = pIS - bNE - aE, 
V = oE- al, 
A = S + E 
and N = S + E + 1 

completed by suitable initial conditions. In the above equations, 
we denote by ft the transmission of pathogen by contact 
between a susceptible and an infectious individual, by v the 
vaccination rate, by a the reciprocal of the latency period, by 
a the reciprocal of the life expectancy of an infectious host. 

We assume a density-dependent mortality rate in the 
absence of the disease, bN. We denote by a the reproduction 
rate, which represents a yearly average, to take into account 
the reduced fecundity of juveniles. Seasonality is not explicitly 
included here, but could easily be with a time-dependent 
reproduction rate [22]. Moreover, we assume that only suscep- 
tible and exposed individuals are able to reproduce. Such an 
assumption is reasonable for a very aggressive disease in wild- 
life, assuming the expected survival of an infectious host much 
too short to give birth or care of the offspring. To show the 
dynamics of the epidemic model, we ran a simulation of the 
SEI model within one single virtual region. We report in 
table 1 the model parameter values that are adapted to raccoon 
rabies for the eastern USA and have either been drawn from 
published values and US Department of Agriculture sources 
(http://www.usda.org) or estimated indirectly. In particular, 
the birth rate a, the transmission rate /3, the latency period 
1 / a and the infectious period 1 / a are taken from literature 
[25-29]. The rate of density-dependent mortality, b, is 
estimated indirectly to produce a disease-free equilibrium 
of 27000 individuals, corresponding to a density of 11 ani- 
mals per km 2 (average for raccoons in the eastern USA 
[23]) in a region of 2457 km 2 (average size of a New York 
county, outside the five boroughs of New York city). We 
simulate 922 weeks of epizootic. The plot of the temporal 
dynamics of the full SEI model (susceptible, exposed, infec- 
tious and total population) is available in the electronic 
supplementary material. 

In order to simplify the dynamics of the SEI system, we 
aggregate the model to a planar system in terms of the infectious 
individuals I and the total population N. Since A = N — I, 
by summing up the first three equations in the model, we get 

N f =aN-(a + a)I - bN(N - 1) 

and 

I 1 = aE — al. 

A fourth class of removed could be included in a more 
general model, consisting of hosts that recovered from the 
disease or have been vaccinated. Since there is no evidence 
for natural recovery in rabies, which is our case study in 
this paper, and we do not consider vaccination at this level, 
the removed class is not considered. However, the following 
results are based on an aggregated method, and the use of a 
SEIR model would not affect the conclusions. 



Table 1. Coefficients of the SEI model. The natural death rate is chosen to 
be density dependent to provide a carrying capacity compatible with the 
published values in literature of 5-17 animals per km 2 [23,24]. The birth 
rate a, the transmission rate /3, the latency period 1/cr and the infectious 
period Ma are taken from [25-29]. 



a 


birth rate 


2.67 k/f/y 


Mo 


natural death rate 


variable 


P 


contact rate 


1x10" 4 (a d)" 1 


Ma 


latency period 


50 days 


Ma 


infectious period 


14 days 



2.2. Main features of the aggregated model 

The aggregated model is not in closed form due to the pres- 
ence in the second equation of the term aE. However, the 
knowledge of the new infectious aE temporal dynamics is 
sufficient to reproduce the dynamics of the full SEI model 
by means of the aggregated one. If the new infectious are 
known as function of time, their dynamics can be considered 
a source term <P in the second equation of the aggregated 
model, which can be written in the more general form 

N' = aN-(a + a)I - bN(N - 1) 

and 

I' = -aI+<P. 

To support our claim, we ran a simulation of the reduced 
model using as a source term in the second equation the 
temporal dynamics of the new infectious aE, obtained by 
simulating the full SEI. We compare in figure 1 its dynamics 
with those of the aggregated model. We plot the dynamics 
of both the total population (figure la) and the number infec- 
tious (figure lb). In both pictures, the dashed line represents 
the values obtained with the full SEI model, whereas the cir- 
cles represent the values obtained with the aggregated 
model. The numerical results confirm that the knowledge of 
the temporal dynamics of the new infectious aE is sufficient 
to reproduce the SEI dynamics with the aggregated model. 

A direct stability analysis for the aggregate model is not 
feasible. However, we can identify the N-nullcline, namely 
the set of points in the plane (N,J), where N' = 0, that is 
shown in figure 2a. If the number of infectious is constant, 
the upper branch of the nullcline is stable, whereas the 
lower branch is unstable. Moreover, as expected, the persist- 
ence of infectious hosts (i.e. an endemic state) reduces the 
carrying capacity of the host. 

Different temporal dynamics of the new infectious <P 
entail complex behaviours of the system in terms of epidemic 
outbreak, including persistency and possible extinction of the 
host population. We simulated different temporal dynamics 
by rescaling the new infectious from the full SEI, as 0 = 
£x {aE), with £=0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2. The 
resulting trajectories in the phase plane (N,J) are plotted in 
figure 2b. If the growth rate of the newly infectious hosts 0 
is too large, the population goes extinct along the bisector 
of the phase plane N = I (note the different scales on the 
axes). Otherwise, the trajectories show different levels of 
population drops in epidemic outbreaks, and a recovery 
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Figure 1. Comparison between the temporal dynamics of [a) total population and (b) the infectious for the complete SEIR (dashed line) and the aggregated model 
(N, I) (bullets). 
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Figure 2. (a) /V-nullcline and stability for constant values of infectious: the upper branch of the curve is stable, whereas the lower one is unstable, (b) Trajectories in 
the phase plane (/V,/) associated with different temporal dynamics of the new infectious (P = £ x (of), with £ = 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, and of 
from the complete SEI model. In red, we highlight the trajectory asociated with £=1, corresponding to the one of the complete SEI model. 



process towards the stable endemic equilibrium on the upper 
branch of the N-nullcline. 



2.3. Modelling detection rate for surveillance 

Effective surveillance within a region amounts to the ability 
to identify newly infectious individuals. In the SEI model, 
this amounts to the correct assessment of aE, and to estimate 
the surveillance levels in the different counties, we need an 
accurate evaluation of this value. However, this value is 
unknown. We propose to extrapolate the value aE from the 
available data in a given observational window, whose 
length we denote by t. Specifically, we consider the reported 
positive and negative cases. We denote by r + (t) and t-{t) the 
reported positive and negative cases at time t, respectively, 
and the total amount of reports (positive and negative) 
along the observation window I t =[t — t, t ] are given by 

{sEJ f |r + (s)/0} {sEl t |r_(s)/0} 



Note that the istantaneous reports r + (t) and r_(£) are 0 for 
most times t, according to the reporting frequency of the 
public health departments. In what follows, the dependency 
on time will be left out. 

We introduce a suitable function of the available reports 
that we denote with F(R +/ R-), whose role is to expand 
the actual number of reported cases to take into account the 
effectiveness of the surveillance procedure, leading to the 
extrapolation model 

N f = aN-(a + a)I - bN(N - 1) 

and 

I' = -aI + F(R + ,R-). 

2.4. Compatibility of the extrapolation functions 

The extrapolation function F(R +/ R-) has to satisfy any com- 
patibility requirements arising from the disease dynamics 
under consideration. Our case study in this paper concerns 
raccoon rabies, which is a lethal disease for the host, killing 



an infected animal within two weeks from the emergence of 
symptoms. For a lethal disease, the total population drop 
(namely the percentage of animals killed by the first out- 
break) is known to be related to the basic reproductive rate 
R 0 associated with the disease [30], and can be used as a com- 
patibility constraint. We would like to observe that estimating 
the population drop with this method is not needed for most 
human diseases, as public health data regarding the number 
of deaths are usually available. 

For the SEI model introduced earlier, the basic reproductive 
rate is given by 

a f3 



a+bN a 



N, 



while the expected population drop [30] is 

The reported values in literature for raccoon rabies R 0 lie 
between 1.2 and 1.4 [26]. As a consequence, a population 
drop between 16% and 28% can be used as a reliable compatibil- 
ity constraint for the system 

N' = aN-(a + a)I - bN(N - I) 



and 



I' = -aI + F(R + ,R_ 



2.5. Modelling extrapolation 

We propose here two different extrapolation functions to 
model surveillance efficacy that depend upon a family of 
parameters. The first models a constant level of surveillance, 
whereas the second models dynamic surveillance over time. 
We base our analysis on the assumption that an outbreak 
actually occurred in every area featuring positive reports. 

2.5.1. Constant surveillance 

Constant surveillance in time is modelled using only the positive 
reports R + , together with a linear extrapolation function 



V 

7 



In the above expression, y is the reporting rate, namely the 
percentage of new rabid cases that are actually detected. 

Reporting activity varies in space and is also known to be 
correlated with the population density [10]. In order to iden- 
tify the local surveillance efficacy for a given area, we express 
y in terms of the human population density of the area (h) 

-l 



K 



1+- 



This choice models an increase in the reporting rate with the 
human density: in particular, if h is zero then y vanishes, and 
as h increases y approaches 1 (that is, in the case where 
human population density is infinite, every new infectious 
case would be detected). The positive parameter K is a risk 
index: the larger its value, the lower the reporting rate for a 
given human population density. 

Knowing the initial population in a given area, we can 
identify the parameter y fulfilling the compatibility require- 
ments on the extrapolation function. In order to assess the 
level of surveillance in the region, we choose the corresponding 
risk index K. 



We iterate the procedure over all the areas of interest and 
identify the corresponding values for y. This procedure 
clearly depends on the epidemic under study. To eliminate 
such dependence, we normalize the risk index to a scale 
from 1 to 10, where a small value indicates a high level of sur- 
veillance in the region, whereas a large value entails a 
significant risk of an outbreak to go undetected in the area. 



2.5.2. Dynamic surveillance 

Dynamic surveillance in time is modelled by using both posi- 
tive R + and negative R- reports, combined through a 
nonlinear extrapolation function 



F dyn (R + ,R_ 



N 



R+ + R- 



1/6 



R+, 



where 6 > 1 is a parameter that represents the surveillance effi- 
cacy. The choice of the function F dyn (R + , R-) relies on two 
assumptions. First, we want a change in a small number of 
total reports to be more significant than a change in a larger 
number (a concept similar to diminishing returns in econ- 
omics). Then, we assume that the testing procedure has 
sensitivity 1 (that is, if we could test all individuals we 
would be able to identify all the new infectious cases) and 
specificity 1 (we have no false positives). As a consequence, 
the function depends also on the total population N. 

Also in this case, knowing the initial population, we can 
identify the parameter 0 fulfilling the compatibility requirements 
on the extrapolation function. We iterate the procedure over all 
the areas of interest and identify the corresponding values 
for 6. In this case, a large value of 9 indicates a high level of sur- 
veillance in the area, whereas a small value of 6 highlights a 
significant risk that an outbreak can go undetected in the region. 



2.6. New York State epidemiological data (1990-2007) 

On 4 May 1990, the first case of a rabid raccoon was recorded 
in the state of New York, in Addison Township, Steuben 
County, on the New York /Pennsylvania border, as part of 
an advancing wavefront of rabies spread. By the end of 
1994, the epizootic had propagated extensively across the 
state. The epizootic wave across NY was actually part of a 
larger epizootic that began at the boundary between Virginia 
and West Virginia in the mid-1970s and spread northeast 
through Pennsylvania and Connecticut and southeast to 
North Carolina [5], but entering NY in 1990. 

At the time of the outbreak, rabies posed a particularly 
pressing public health problem with the number of post- 
exposure prophylactic treatments increasing from around 70 
before the outbreak to over 1200 by 1991 [31]. Consequently, 
intensive surveillance and monitoring of wildlife populations 
was undertaken by the state and continues today. An extensive 
database has been collected by the New York State Department 
of Health. Each entry was recorded at the township level (754 
locations) from 1990 to the present. The data we use in our 
analysis are those positive and negative cases verified by the 
New York State Department of Health from 1990 to 2007. 

We aggregated the data at the county level, at which sur- 
veillance and intervention policies are actually implemented. 
Table 2 collects the 56 counties that featured reported cases 
of rabid raccoons in the period 1990-2007, their human 
population density and the total positive cases. Figure 3 
illustrates the progression of the epidemic across the state at 



Table 2. New York State epidemiological data (1990-2007). Counties, area, human population densities and total reported rabid cases from 1990 to 2007. 
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four different times, in terms of total reported cases at the 
county level. 



2.7. Estimate of the raccoon population 

One of the major limitations in studying wildlife epidemics is 
the difficulty in establishing the actual size of the at-risk 
population under investigation. Best estimates from the litera- 
ture suggest that raccoon density in the eastern USA falls in 
the range of 5-17 animals per km 2 [23,24]. 

We consider in this study all 56 counties (table 2) that 
featured reported cases of rabid raccoons in the period 1990- 
2007. We mitigate the uncertainty about the actual raccoon 
population size by drawing, for each county, 50 values from 
a normal distribution with mean 11 and s.d. 2 (in order to 
cover the variability among the different ranges in the litera- 
ture, see [23,24] and references therein). We add a correction 
to this distribution by taking into account the human popu- 
lation density: according to the New York State Department 
of Environmental Conservation (http://www.dec.ny.gov/ani- 
mals/9358.html), raccoons are more prone to establish in 
areas where the human presence is higher. Suburban/ 



metropolitan areas are often associated with the highest 
recorded raccoon population densities. We thus added an 
extra term to the counties with human density above the aver- 
age for the state (157.81 individuals per km 2 ), by adding draws 
from a normal distribution with mean 0.3 Vh (h being the 
human density for the zth county) and s.d. 12. The concerned 
counties are Albany, Erie, Monroe, Nassau, Niagara, Rockland, 
Schenectady, Suffolk and Westchester. We plot in figure 4 the 
minimal (figure 4a) and maximal (figure 4b) initial populations 
stochastically generated by the procedure described earlier, 
and we report in table 3 the corresponding values. 



2.8. Model simulation and risk identification 

We ran simulations of the aggregated system with extrap- 
olation from the data for all 56 counties with the 50 values 
of the initial population described earlier. The reports' behav- 
iour along time seems to suggest the presence of an epidemic 
in almost all counties featuring positive reports, with the 
exception of Clinton, Hamilton, Suffolk and Warren, where 
the scarcity of reports does not allow us to draw evidence. 
The results for these counties have thus to be considered 




Figure 3. Total reported cases aggregated by county at different times, (a) 6 May 1991, (b) 30 November 1992, (c) 21 June 1994 and (d) 31 December 2007. 
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Figure 4. (a) Minimal and (b) maximal initial population stochastically generated in the 56 counties included in the study. 



with care. We assumed that at the beginning of the epizootic 
the host population is entirely susceptible and at equilibrium, 
and that an epidemic has actually taken place in the counties 
included in the study. As a consequence, a drop in the popu- 
lation occurred that was compatible with the epizootic of the 
disease. We tested both the static and the dynamic model 



approaches, by running the SEI model with sampled values 
for y and 6. Different values for y and 0 produce different tem- 
poral dynamics for the total population and different 
population drops (figure 5b r d)- We sampled values of y 
between 0 and 1, and values of 0 between 1 and 7. For all 56 
counties, we identify for all values of the initial population, 



Table 3. New York State: estimate of raccoon population. Minimal and maximal initial raccoon population for the counties included in the study. 
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the ranges of the parameters that produce population drops 
between 16% and 28% during the first outbreak. 

Knowing the initial population, we can then assess the 
level of risk for each county (labelled by i = 1, . . . , 56) for 
the constant surveillance model. We choose the risk index 
K l m , obtained algebraically from the midpoint of the compat- 
ibility interval for y. If the compatible values of y for the zth 
county lie in the interval 71 = (y^ in , y^ax)' * ne corresponding 
risk index is given by K l m = /z z /y^(l — y^), where hi is the 
human population of the county, and y^ is the midpoint of 
the interval rj. The procedure clearly depends on the epi- 
demic under study. In order to eliminate such dependence, 
we normalize the risk index to a scale from 1 to 10. Hence, 
we introduce for the zth county a surveillance risk p ir which 
is defined as the natural logarithm of K l m weighted by its 
maximum over all counties. The corresponding surveillance 
risk for th zth county is then given by 



Pi 



10 x 



max log(Xj r 



where a small value of p z indicates a high level of surveillance 
in the county, whereas a large value of p z entails a significant 
risk of an epidemic going undetected in the area. 

In a similar manner, we can assess the surveillance effi- 
cacy for the dynamic surveillance model. In this case, we 
consider as indicator for the surveillance efficacy in the zth 
county, the value of 0,- corresponding to the midpoint of the 
interval associated with the initial population. A large 
value of Si indicates a high level of surveillance in the area, 
whereas a small value of Si highlights a significant risk that 
an epidemic will go undetected in the region. 



Finally, the values of p z and Si can be plotted on a geo- 
graphical map to get a comprehensive view of the global 
risk across the state. 



3. Results 

Detailed results are shown for Albany County. This county has 
a very high count of reports, probably associated with the pres- 
ence of the rabies diagnostic laboratory of the Wadsworth 
Center (New York State Health Department). We would like 
to observe that the presence of this large facility might 
induce bias in the estimated surveillance risk for the neighbour- 
ing counties. However, the observed disease dynamics are not 
different from what was observed in the majority of other 
counties. Figure 5a,c shows, respectively, for constant and 
dynamic surveillance, the curves obtained connecting the 
values of the parameters (y and 6) paired with the associated 
population drop. The dashed blue line corresponds to the 
lower bound for the initial raccoon population, whereas the 
red line corresponds to the upper bound. The intersections of 
the two curves with the horizontal lines at 16% and 28% 
drop locate the intervals, where the compatibility constraints 
are satisfied. For static surveillance, we have y €E (0.02,0.05) 
in the case we believe that the raccoon population is on the 
higher end of the estimate, and y (0.12,0.23) for the lower 
end. As we can see the lack of overlap between the compatibil- 
ity intervals associated with the minimal and maximal initial 
population implies that optimal surveillance levels can be 
potentially very different. The importance of an accurate esti- 
mate of the initial raccoon population is crucial. A similar 



1.0 



0.8 



0.6 



0.4 



0.2 



- min. population 

- max. population 



(b) 



28% drop 
16% drop 



60000 
50000 
40000 
30000 
20000 
10000 



0.1 0.2 0.3 0.4 0.5 0.6 
surveillance accuracy, j 



0.7 




1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 



id) 

60000 
50000 
I 40000 



Oh 

Sh 30000 



20000 
10000 



16% drop 
28% drop 



7=0.0109 
7=0.0206 
-7=0.0326 
-7=0.0944 



100 200 300 400 500 600 700 800 900 
time (weeks) 



16% drop 
28% drop 



0=1.5051 
0=1.9596 

■ 0=2.2626 

■ 0=3.2727 



0 100 200 300 400 500 600 700 800 900 
time (weeks) 



Figure 5. Albany county. (a,b) Constant surveillance, (c,d) dynamic surveillance, (a) Population drops in Albany county as a function of the surveillance accuracy y 
for the estimated minimal and maximal population, (b) Temporal dynamics of the total population for different surveillance accuracies, given a maximal level as 
initial condition, (c) Population drops in Albany county as a function of the surveillance efficacy 6 for the estimated minimal and maximal population, (d) Temporal 
dynamics of the total population for different surveillance efficacies, given a maximal level as initial condition. In [a-d], the horizontal dashed lines identify the 
range of population drop expected for raccoon rabies. 



argument can be drawn for 0 in the dynamic surveillance 
model, as shown in figure 5c. 

Figure 5b,d shows different time series for the total raccoon 
population associated with different surveillance scenarios. We 
believe that the outbreak that occurred in Albany County was 
typical and we expect disease dynamics consistent with the 
values of R 0 in the literature. For a population of roughly 
60 000 raccoons, we can observe the drop caused by the out- 
break, some damped oscillations and a slow recovery to the 
endemic equilibrium carrying capacity. 

In figure 6a, c, we have comprehensive plots for y and 6 
for the estimated intervals for all the 56 counties alphabeti- 
cally ordered. The same level of surveillance can produce 
completely different interpretation of the disease dynamics: 
for instance, a value y = 0.07 is associated with an outbreak 
so violent that it leads to extinction if the initial population 
is the minimal one, and at the same time with a complete 
absence of outbreak in the case where the initial population 
is the maximal one. Such a feature is shared by almost all 
the counties when a constant level of surveillance is assumed 
(figure 6a), with the exception of Clinton and Suffolk. In the 
case of dynamic surveillance, in contrast, only 12 counties do 
not feature an overlap between the intervals of 0 correspond- 
ing to minimal and maximal initial population (figure 6c). 
Moreover, among those 12, only two feature a significant 



gap comparable with the length of the smaller interval 
(Albany and Schenectady). 

Since the actual raccoon population is not known with abso- 
lute certainty, we choose to geographically map (figure 6b, d) the 
values of p z and 0 Z corresponding to the maximal estimated 
initial population. This is a conservative choice, justified by the 
consideration that the higher the population, the higher the 
risk (and relative consequences in terms of public health) of an 
undetected epidemic. 

Finally, a somewhat expected duality between the intrinsic 
surveillance risk p associated with the constant extrapolation 
and the surveillance efficacy 6 associated with the dynamic 
extrapolation is apparent, and can be assessed directly from 
the risk and efficacy mappings: areas with low surveillance 
risk display higher levels of surveillance efficacy. 



4. Discussion 

Surveillance is a key element in detecting, monitoring and 
studying infectious disease outbreaks over time and space. 
In this paper, we present some methodological aspects that 
can be used to evaluate the impact of localized surveillance 
for infectious diseases and help devising public health 
strategies. Intervention is based on information and the aim 




counties 



Figure 6. [a,c) Compatibility intervals for minimal and maximal initial raccoon population for the 56 New York counties with reported positive cases. Counties are 
numbered in alphabetical order matching table 1. (a) Constant surveillance, (c) dynamic surveillance. (b,d) Maps of surveillance risk and efficacy for the 56 
New York counties with reported positive cases, (b) Surveillance risk map associated with constant surveillance. Small values indicate an high level of surveillance in 
the region, whereas large values entail a significant risk of an epidemic going undetected in the area, (rf) Surveillance efficacy map associated with dynamic surveillance. 
Large values indicate a high level of surveillance in the area, whereas small values highlight a significant risk that an epidemics can go undetected in the region. 



of this paper is to provide some of the information to 
decision-makers. As an illustration to the methodology, 
we showed an example based on a real dataset, consisting 
of positive and negative reported cases of rabid raccoons 
in the state of New York over a period spanning from 1990 
to 2007. 

We introduce two methods, both based on the idea of com- 
bining process-driven models with an observational approach, 
to take advantage of the features of both. The first method is 
based on a simple, constant reporting/ detection rate, intended 
to model a constant level of surveillance over time. Consider- 
ing that surveillance levels usually change because of news 
effects and public health concerns over possible outbreaks 
[18], we relax this assumption in our second model, where 
we formulate a reporting/detection rate that changes over 
time and depends on the total number of reports (positive 
and negative) and the estimated host population. Provided 
that such an estimate is accurate at any given time, it is pos- 
sible to track disease dynamics through a model for disease 
spread [12]. With each of the two methods, we are able to 
identify locations where surveillance levels are critical and 
can potentially leave an outbreak unidentified. 

The first method identifies surveillance risk, whereas the 
second one identifies a surveillance efficacy. An expected 



negative correlation between risk and efficacy emerged 
(-0.5652384). Besides being intuitive, such correlation is 
actually a sign that the two approaches are consistent, and 
either one can be used to identify areas at greater risk to 
which resources should be allocated in priority. The dynamic 
surveillance method (which assesses surveillance efficacy) 
provides results that are less sensitive to the initial population 
size. This aspect is very promising in view of extending the 
approach presented here to human diseases, where accurate 
accounts of the total population, with high resolution in 
space and more stable self reporting rates are available. 

Two significant assumptions underlay our analyses. The 
first pertains to possible scenarios for the initial population 
size (before the first cases were recorded), and the second is 
that an epidemic actually occurred in each county where 
there was a positive reported case. We note that the first 
assumption is less limiting in the instance of human diseases. 
Since our study focuses on raccoon rabies, an a priori knowl- 
edge about the epidemiology of the disease is well known 
and established [29]. This is not a limiting aspect as long as 
the methodology is applied to extant diseases, but could 
prove problematic when applied to a newly emerging pathogen 
for which the epidemiology is not yet available. In this case, the 
method should be adapted by introducing some stochasticity in 



the key model parameters such as the transmission rate and the 
latency period. 

Our work has the potential to be extended at both the meth- 
odological and applied level. For instance, the raccoon rabies 
surveillance analysis can potentially benefit from the inclusion 
of information regarding vaccinations programmes. Oral 
Rabies Vaccination (ORV) was initiated during the collection 
of our data (J. E. Childs 2000, personal communication) and 
may have affected, for instance, Essex and Clinton counties as 
suggested by a slight decline in the number of reports in 
those counties after ORV establishment. Unfortunately, we do 
not know if these modest declines are due to ORV or simply 
the decline is cases as the epizootic moved through the 
county. Very little is known about the rate of transition of indi- 
viduals from the susceptible to the immune class through 
artificial immunization and we cannot, at this point, include 
such dynamics in our modelling. Investigating the efficacy of 
ORV programmes and verifying their eventual impact on dis- 
ease dynamics might help better understanding of targeted 
surveillance, although it is unclear whether the conclusions 
we reached in our work will be sensitive to this extension. 

Although uncertainty in outbreak size is taken into 
account by estimating system trajectories for different levels 
of R 0 [21] and of initial host population [23], the model can 
be further generalized by including randomness in some of 
the parameters. A current work in progress involves esti- 
mation of parameters in a full Bayesian hierarchical setting. 



Combining the information from previous studies (prior eli- 
citation) with the evidence arising from observational data 
(likelihood), we are able to produce estimates and uncertainty 
assessment for all the model parameters. This form of model- 
ling bears directly on our understanding of the underlying 
disease process. Nonetheless, however the results can be 
incorporated also into the surveillance setting. 

In future work, one could also estimate optimal levels of 
surveillance, by maximizing an utility function that depends 
on the social or environmental benefits of detecting an epi- 
demic and on a penalty term with the costs associated with 
implementing surveillance policies. Furthermore, writing a 
stochastic model, possibly with the introduction of a spatial 
dynamics not considered in the present work [14,32-34], 
will allow us to actually estimate parameters and optimal sur- 
veillance levels in a likelihood framework. Finally, we also 
envision applications to other types of diseases where accu- 
rate estimates for the host population are available (for 
instance, some infectious diseases in humans). 
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